Compact Lexicon Selection with Spectral Methods

Authors

  • Young-Bum Kim
  • Karl Stratos
  • Xiaohu Liu
  • Ruhi Sarikaya
Abstract

In this paper, we introduce the task of selecting a compact lexicon from large, noisy gazetteers. This scenario arises often in practice, in particular in spoken language understanding (SLU). We propose a simple and effective solution based on two matrix decomposition techniques: canonical correlation analysis (CCA) and rank-revealing QR (RRQR) factorization. CCA is first used to derive low-dimensional gazetteer embeddings from domain-specific search logs. RRQR is then used to find a subset of these embeddings whose span approximates the entire lexicon space. Experiments on slot tagging show that our method yields a small set of lexicon entities with an average relative error reduction of more than 50% over a randomly selected lexicon.
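
The two-step pipeline described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the CCA formulation or the RRQR algorithm, so the sketch substitutes an SVD of a count-scaled co-occurrence matrix for the CCA step and SciPy's column-pivoted QR for the RRQR step. The function names `cca_embeddings` and `select_entries`, the toy count matrix, and all parameter values are hypothetical.

```python
import numpy as np
from scipy.linalg import qr


def cca_embeddings(counts, dim):
    """Embed lexicon entries (rows) from entry-by-context co-occurrence counts.

    Assumption: a count-scaled SVD stands in for the CCA step; the abstract
    does not specify the exact formulation.
    """
    counts = np.asarray(counts, dtype=float)
    row_totals = counts.sum(axis=1)
    col_totals = counts.sum(axis=0)
    # Avoid division by zero for entries or contexts that never occur.
    row_totals[row_totals == 0] = 1.0
    col_totals[col_totals == 0] = 1.0
    scaled = counts / np.sqrt(row_totals)[:, None] / np.sqrt(col_totals)[None, :]
    u, _, _ = np.linalg.svd(scaled, full_matrices=False)
    # Keep the leading `dim` left singular directions as entry embeddings.
    return u[:, :dim]


def select_entries(embeddings, k):
    """Return indices of k entries chosen by column-pivoted QR.

    Assumption: column pivoting is used as a simple stand-in for RRQR.
    Entries become columns, so the pivot order ranks them by how much new
    direction each one adds to the span of those already chosen.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    n_entries, dim = embeddings.shape
    if not 0 < k <= min(n_entries, dim):
        raise ValueError("k must be between 1 and min(n_entries, dim)")
    _, _, pivots = qr(embeddings.T, mode="economic", pivoting=True)
    return pivots[:k]


# Hypothetical toy data: 50 lexicon entries, 20 context features.
rng = np.random.default_rng(0)
toy_counts = rng.poisson(3.0, size=(50, 20))
toy_embeddings = cca_embeddings(toy_counts, dim=5)
chosen = select_entries(toy_embeddings, k=5)
```

In this sketch the number of selected entries cannot exceed the embedding dimension, because pivots beyond the rank of the embedding matrix carry no further information about its span.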


Similar resources

Underspecified Phonological Features for Lexical Access

The FUL (featurally underspecified lexicon) system of automatic speech recognition is based on the representation of words in the lexicon with underspecified distinctive features. The speech signal is converted from the waveform into an online spectral representation made up of LPC formants and a few parameters describing the overall spectral shape. These spectral parameters are converted into ...


The Effect of Lexicon-based Debates on the Felicity of Lexical Equivalents in Translating Literary Texts by Iranian EFL Learners

This study was an attempt to investigate the effect of lexicon-based debates on the felicity of lexical equivalents in translating literary texts by Iranian EFL learners. To fulfill the purpose of this study, 59 university students majoring in English Translation were randomly assigned to the experimental and control groups from a total of 73 students based on their performance on a mock TOE...


Higher Derivations Associated with the Cauchy-Jensen Type Mapping

Let $H$ be an infinite-dimensional Hilbert space and $K(H)$ be the set of all compact operators on $H$. We adopt the spectral theorem for compact self-adjoint operators to investigate higher derivations and higher Jordan derivations on $K(H)$ associated with the following Cauchy-Jensen type functional equation: $2f\left(\frac{T+S}{2}+R\right)=f(T)+f(S)+2f(R)$ for all $T,S,R\in K(H)$.


Learning Compact Lexicons for CCG Semantic Parsing

We present methods to control the lexicon size when learning a Combinatory Categorial Grammar semantic parser. Existing methods incrementally expand the lexicon by greedily adding entries, considering a single training datapoint at a time. We propose using corpus-level statistics for lexicon learning decisions. We introduce voting to globally consider adding entries to the lexicon, and pruning ...



Publication date: 2015